Word-Sense Classification by Hierarchical Clustering

Authors

  • Ken Y. K. Lau
  • Robert Wing Pong Luk
Abstract

This paper investigates the use of clustering techniques in word-sense classification, which identifies the different contexts in which a word is used with the same or a similar sense. For simplicity, we used two hierarchical clustering techniques, single- and complete-linkage, and our performance measurements (i.e., recall and precision), taken against manual groupings of contexts with similar meaning, show that the latter is the more suitable technique. We found that using part-of-speech tags gives better clustering performance than omitting them, and that fixed-length context gives better performance than sentence context. When different manually identified groupings of contexts are compared with each other, recall and precision are about 80%, which is not very different from the average recall and precision of complete-linkage clustering, at 80% and 75%, respectively.
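
To make the setup in the abstract concrete, here is a minimal Python sketch of single- and complete-linkage clustering over fixed-length contexts, scored against a manual grouping. It is not the authors' implementation: the Jaccard distance over context words, the pairwise definition of precision and recall, the window size, the target cluster count `k`, and the toy sentences are all assumptions made for illustration, and part-of-speech tags are left out.

```python
from itertools import combinations


def window(tokens, target, size=2):
    """Fixed-length context: up to `size` tokens on each side of the first `target`."""
    i = tokens.index(target)
    return frozenset(tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size])


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0 means identical contexts, 1 means disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(items, k, linkage="complete"):
    """Repeatedly merge the two closest clusters until only k clusters remain."""
    # Complete linkage scores two clusters by their farthest pair of members,
    # single linkage by their nearest pair.
    pick = max if linkage == "complete" else min
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        def cluster_distance(pair):
            a, b = pair
            return pick(jaccard_distance(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] += clusters.pop(b)  # a < b, so index a is unaffected by the pop
    return [sorted(c) for c in clusters]


def same_cluster_pairs(clusters):
    """All unordered pairs of items that share a cluster."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def pairwise_precision_recall(predicted, manual):
    """Precision and recall over same-cluster pairs (an assumed evaluation scheme)."""
    p, m = same_cluster_pairs(predicted), same_cluster_pairs(manual)
    hits = len(p & m)
    return (hits / len(p) if p else 1.0), (hits / len(m) if m else 1.0)


# Hypothetical toy data: four uses of "bank", two per sense.
sentences = [
    "she deposited money in the bank account today",
    "he withdrew money from the bank account yesterday",
    "they fished along the river bank near the bridge",
    "we walked by the river bank at dawn",
]
contexts = [window(s.split(), "bank") for s in sentences]
manual_groups = [[0, 1], [2, 3]]

for linkage in ("single", "complete"):
    predicted = agglomerate(contexts, k=2, linkage=linkage)
    print(linkage, predicted, pairwise_precision_recall(predicted, manual_groups))
```

On data this small both linkages behave alike; differences between them would only be expected to show up on larger and noisier sets of contexts.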

Similar resources

Chinese Word Sense Induction based on Hierarchical Clustering Algorithm

Sense induction seeks to automatically identify the senses of polysemous words encountered in a corpus. Unsupervised word sense induction can be viewed as a clustering problem. In this paper, we use a hierarchical clustering algorithm as the classifier for word sense induction. Experiments show that the system achieves a 72% F-score on the training corpus and a 65% F-score on the test corpus.

Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving performance on the text classification task by extending the traditional “bag of words” representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambig...

Word Sense Induction Disambiguation Using Hierarchical Random Graphs

Graph-based methods have gained attention in many areas of Natural Language Processing (NLP), including Word Sense Disambiguation (WSD), text summarization, keyword extraction and others. Most of the work in these areas formulates the problem in a graph-based setting and applies unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchi...

Application of Clustering Methods in DNA Microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Clustering Paraphrases by Word Sense

Automatically generated databases of English paraphrases have the drawback that they return a single list of paraphrases for an input word or phrase. This means that all senses of polysemous words are grouped together, unlike WordNet which partitions different senses into separate synsets. We present a new method for clustering paraphrases by word sense, and apply it to the Paraphrase Database ...

Publication date: 1998